Difference of Convex Functions Programming for Reinforcement Learning

Authors

Bilal Piot

Matthieu Geist

Olivier Pietquin

Abstract

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) T*Q − Q, where T* is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program. This makes it possible to draw on the large literature on DC Programming to address the Reinforcement Learning problem.
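To make the Optimal Bellman Residual concrete, the following is a minimal sketch for a tabular MDP with known dynamics. The two-state, two-action MDP, the function names `optimal_bellman_operator` and `obr_norm`, and the uniformly weighted p-norm are illustrative assumptions and are not taken from the paper, which works with sampled transitions and function approximation.

```python
import numpy as np

def optimal_bellman_operator(Q, P, R, gamma):
    """Apply T* to Q: (T*Q)(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * max_a' Q(s', a')."""
    greedy_values = Q.max(axis=1)          # max over actions, shape (S,)
    return R + gamma * P @ greedy_values   # P has shape (S, A, S); result has shape (S, A)

def obr_norm(Q, P, R, gamma, p=1):
    """Uniformly weighted p-norm of the residual T*Q - Q (assumed weighting)."""
    residual = optimal_bellman_operator(Q, P, R, gamma) - Q
    return np.mean(np.abs(residual) ** p) ** (1.0 / p)

# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1.
# Only taking action 1 in state 1 is rewarded.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 1] = 1.0
R = np.array([[0.0, 0.0],
              [0.0, 1.0]])
gamma = 0.9

# Reference solution by plain value iteration, used here only for comparison.
Q_star = np.zeros((2, 2))
for _ in range(500):
    Q_star = optimal_bellman_operator(Q_star, P, R, gamma)

Q_zero = np.zeros((2, 2))
print("OBR norm at Q = 0: ", obr_norm(Q_zero, P, R, gamma))
print("OBR norm at Q = Q*:", obr_norm(Q_star, P, R, gamma))
```

The residual vanishes at the fixed point of T* and is strictly positive elsewhere, which is why its norm can serve as an objective. In this sketch the norm is evaluated exactly; the abstract's setting replaces it with an empirical norm computed from data.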

Related resources

Difference of Convex Functions Programming Applied to Control with Expert Data

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-reg...

Optimality and Duality for an Efficient Solution of Multiobjective Nonlinear Fractional Programming Problem Involving Semilocally Convex Functions

In this paper, the problem under consideration is multiobjective non-linear fractional programming problem involving semilocally convex and related functions. We have discussed the interrelation between the solution sets involving properly efficient solutions of multiobjective fractional programming and corresponding scalar fractional programming problem. Necessary and sufficient optimality...

On Sequential Optimality Conditions without Constraint Qualifications for Nonlinear Programming with Nonsmooth Convex Objective Functions

Sequential optimality conditions provide adequate theoretical tools to justify stopping criteria for nonlinear programming solvers. Here, nonsmooth approximate gradient projection and complementary approximate Karush-Kuhn-Tucker conditions are presented. These sequential optimality conditions are satisfied by local minimizers of optimization problems independently of the fulfillment of constrai...

Convex Generalized Semi-Infinite Programming Problems with Constraint Sets: Necessary Conditions

We consider generalized semi-infinite programming problems in which the index set of the inequality constraints depends on the decision vector and all emerging functions are assumed to be convex. Considering a lower level constraint qualification, we derive a formula for estimating the subdifferential of the value function. Finally, we establish the Fritz-John necessary optimality con...

Inequalities of Ando's Type for $n$-convex Functions

By utilizing different scalar equalities obtained via Hermite's interpolating polynomial, we will obtain lower and upper bounds for the difference in Ando's inequality and in the Edmundson-Lah-Ribarič inequality for solidarities that hold for a class of $n$-convex functions. As an application, main results are applied to some operator means and relative operator entropy.

Publication date: 2014
